Fix is_zero() for FixedSizeList and Struct scalar types#8500
Conversation
`Scalar::is_zero` previously only checked the element/field count for `FixedSizeList` and `Struct` values, so any non-null struct or fixed-size list was reported as zero. Recurse into the elements/fields and require every one to be a non-null zero value, matching the documented `Scalar::zero_value` semantics. Add regression tests covering non-zero elements/fields, null elements/fields, and a nested struct-of-fixed-size-list value. These fail against the previous element-count-only behavior. Closes: #7891 Signed-off-by: Connor Tsui <connor@spiraldb.com> Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_0139i5huJ9ixt2m3LggBoUKQ
Merging this PR will not alter performance
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | decompress_rd[f64, (10000, 0.01)] |
108.7 µs | 139 µs | -21.83% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.1)] |
109 µs | 139.4 µs | -21.81% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.0)] |
108.7 µs | 139 µs | -21.79% |
| ❌ | Simulation | decompress_rd[f32, (100000, 0.0)] |
496 µs | 583.7 µs | -15.03% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.1)] |
78.1 µs | 90.6 µs | -13.8% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.01)] |
78.1 µs | 90.6 µs | -13.79% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.0)] |
78.5 µs | 91.1 µs | -13.75% |
| ⚡ | Simulation | chunked_bool_canonical_into[(1000, 10)] |
32 µs | 16.6 µs | +92.19% |
| ⚡ | Simulation | decompress_rd[f64, (100000, 0.1)] |
1,020.7 µs | 842.7 µs | +21.13% |
| ⚡ | Simulation | decompress_rd[f64, (100000, 0.01)] |
1,020.7 µs | 842.7 µs | +21.13% |
| ⚡ | Simulation | decompress_rd[f32, (100000, 0.1)] |
582.9 µs | 495.3 µs | +17.69% |
| ⚡ | Simulation | decompress_rd[f32, (100000, 0.01)] |
582.9 µs | 495.3 µs | +17.69% |
| ⚡ | Simulation | eq_i64_constant |
318 µs | 288.2 µs | +10.34% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing claude/serene-darwin-ne73a3 (1abc9e3) with develop (6a3fb19)
| // A fixed-size list is zero only if it has the expected number of elements and every | ||
| // element is itself a non-null zero value. | ||
| DType::FixedSizeList(_, list_size, _) => { | ||
| let list = self.as_list(); | ||
| list.len() == *list_size as usize | ||
| && (0..list.len()) | ||
| .all(|i| list.element(i).is_some_and(|e| e.is_zero() == Some(true))) | ||
| } | ||
| // A struct is zero only if every one of its fields is itself a non-null zero value. | ||
| DType::Struct(..) => self | ||
| .as_struct() | ||
| .fields_iter() | ||
| .is_some_and(|mut fields| fields.all(|f| f.is_zero() == Some(true))), |
There was a problem hiding this comment.
The definitions here seem really weird to me, I think its worth documenting the "why" here.
How do other systems handle this?
There was a problem hiding this comment.
I'm not sure why they seem weird? I feel like this is the only possible correct definition for these scalars.
There was a problem hiding this comment.
In my mind a "zero" struct is an empty struct (as in - 0 fields) of length 0
There was a problem hiding this comment.
But this is a scalar, which means it's already length 1. And I would think of this as is_default almost where we do not have control over the struct fields, we just call is_default on all fields (via auto-derived Default impl of the struct).
There was a problem hiding this comment.
More importantly, we have to implement the functions in vortex-array/src/scalar/scalar_value.rs somehow based on the dtype, and the is_ functions have to mirror that.
Summary
Closes: #7891
Fixes incorrect
is_zero()implementations forFixedSizeListandStructscalar types that were only checking the number of elements/fields rather than recursively validating that all contained values are themselves zero.